home *** CD-ROM | disk | FTP | other *** search
-
-
-
-
-
- MINIX: A CHEAP UNIX CLONE WITH SOURCE CODE
-
- Andrew S. Tanenbaum
-
- Dept. of Mathematics and Computer Science
- Vrije Universiteit
- Amsterdam, The Netherlands
-
-
-
-
-
- 1. OVERVIEW OF THE MINIX SYSTEM ARCHITECTURE
- UNIX- is organized as a single executable program that
- is loaded into memory at system boot time and then run.
- MINIX is structured in a much more modular way, as a collec-
- tion of processes that communicate with each other and with
- user processes by sending and receiving messages. There are
- separate processes for the memory manager, the file system,
- for each device driver, and for certain other system func-
- tions. This structure enforces a better interface between
- the pieces. The file system cannot, for example, acciden-
- tally change the memory manager's tables because the file
- system and memory manager each have their own private
- address spaces.
- These system processes are each full-fledged processes,
- with their own memory allocation, process table entry and
- state. They can be run, blocked, and send messages, just as
- the user processes. In fact, the memory manager and file
- system each run in user space as ordinary processes. The
- device drivers are all linked together with the kernel into
- the same binary program, but they communicate with each
- other and with the other processes by message passing.
- When the system is compiled, four binary programs are
- independently created: the kernel (including the driver
- processes), the memory manager, the file system, and init
- (which reads /etc/ttys and forks off the login processes).
- In other words, compiling the system results in four dis-
- tinct a.out files. When the system is booted, all four of
- these are read into memory from the boot diskette.
- It is possible, and in fact, normal, to modify, recom-
- pile, and relink, say, the file system, without having to
- relink the other three pieces. This design provides a high
- degree of modularity by dividing the system up into indepen-
- dent pieces, each with a well-defined function and interface
- to the other pieces. The pieces communicate by sending and
- receiving messages.
- The various processes are structured in four layers:
-
- 4. The user processes (top layer).
- 3. The server processes (memory manager and file system).
- _________________________
- - UNIX is a trademark of AT&T Bell Laboratories.
-
-
-
-
-
-
-
-
-
-
-
-
-
- 2. The device drivers, one process per device.
- 1. Process and message handling (bottom layer).
-
- Let us now briefly summarize the function of each layer.
- Layer 1 is concerned with doing process management
- including CPU scheduling and interprocess communication.
- When a process does a SEND or RECEIVE, it traps to the ker-
- nel, which then tries to execute the command. If the com-
- mand cannot be executed (e.g., a process does a RECEIVE and
- there are no messages waiting for it), the caller is blocked
- until the command can be executed, at which time the process
- is reactivated. When an interrupt occurs, layer 1 converts
- it into a message to the appropriate device driver, which
- will normally be blocked waiting for it. The decision about
- which process to run when is also made in layer 1. A prior-
- ity algorithm is used, giving device drivers higher priority
- over ordinary user processes, for example.
- Layer 2 contains the device drivers, one process per
- major device. These processes are part of the kernel's
- address space because they must run in kernel mode to access
- I/O device registers and execute I/O instructions. Although
- the IBM PC does not have user mode/kernel mode, most other
- machines do, so this decision has been made with an eye
- toward the future. To distinguish the processes within the
- kernel from those in user space, the kernel processes are
- called tasks.
- Layer 3 contains only two processes, the memory manager
- and the file system. They are both structured as servers,
- with the user processes as clients. When a user process
- (i.e., a client) wants to execute a system call, it calls,
- for example, the library procedure read with the file
- descriptor, buffer, and count. The library procedure builds
- a message containing the system call number and the parame-
- ters and sends it to the file system. The client then
- blocks waiting for a reply. When the file system receives
- the message, it carries it out and sends back a reply con-
- taining the number of bytes read or the error code. The
- library procedure gets the reply and returns the result to
- the caller in the usual way. The user is completely unaware
- of what is going on here, making it easy to replace the
- local file system with a remote one.
- Layer 4 contains the user programs. When the system
- comes up, init forks off login processes, which then wait
- for input. On a successful login, the shell is executed.
- Processes can fork, resulting in a tree of processes, with
- init at the root. When CTRL-D is typed to a shell, it
- exits, and init replaces the shell with another login pro-
- cess.
-
- 2. LAYER 1 - PROCESSES AND MESSAGES
- The two basic concepts on which MINIX is built are
- processes and messages. A process is an independently
- schedulable entity with its own process table entry. A mes-
- sage is a structure containing the sender's process number,
-
-
-
-
-
-
-
-
-
- - 2 -
-
-
- a message type field, and a variable part (a union) contain-
- ing the parameters or reply codes of the message. Message
- size is fixed, depending on how big the union happens to be
- on the machine in question. On the IBM PC it is 24 bytes.
- Three kernel calls are provided:
-
- - RECEIVE(source, &message)
- - SEND(destination, &message)
- - SENDREC(process, &message)
-
- These are the only true system calls (i.e., traps to the
- kernel). RECEIVE announces the willingness of the caller to
- accept a message from a specified process, or ANY, if the
- RECEIVER will accept any message. (From here on, ``pro-
- cess'' also includes the tasks.) If no message is available,
- the receiving process is blocked. SEND attempts to transmit
- a message to the destination process. If the destination
- process is currently blocked trying to receive from the
- sender, the kernel copies the message from the sender's
- buffer to the receiver's buffer, and then marks them both as
- runnable. If the receiver is not waiting for a message from
- the sender, the sender is blocked.
- The SENDREC primitive combines the functions of the
- other two. It sends a message to the indicated process, and
- then blocks until a reply has been received. The reply
- overwrites the original message. User processes use SENDREC
- to execute system calls by sending messages to the servers
- and then blocking until the reply arrives.
- There are two ways to enter the kernel. One way is by
- the trap resulting from a process' attempt to send or
- receive a message. The other way is by an interrupt. When
- an interrupt occurs, the registers and machine state of the
- currently running process are saved in its process table
- entry. Then a general interrupt handler is called with the
- interrupt number as parameter. This procedure builds a mes-
- sage of type INTERRUPT, copies it to the buffer of the wait-
- ing task, marks that task as runnable (unblocked), and then
- calls the scheduler to see who to run next.
- The scheduler maintains three queues, corresponding to
- layers 2, 3, and 4, respectively. The driver queue has the
- highest priority, the server queue has middle priority, and
- the user queue has lowest priority. The scheduling algo-
- rithm is simple: find the highest priority queue that has at
- least one process on it, and run the first process on that
- queue. In this way, a clock interrupt will cause a process
- switch if the file system was running, but not if the disk
- driver was running. If the disk driver was running, the
- clock task will be put at the end of the highest priority
- queue, and run when its turn comes.
- In addition to this rule, once every 100 msec, the
- clock task checks to see if the current process is a user
- process that has been running for at least 100 msec. If so,
- that user is removed from the front of the user queue and
- put on the back. In effect, compute bound user processes
-
-
-
-
-
-
-
-
-
- - 3 -
-
-
- are run using a round robin scheduler. Once started, a user
- process runs until either it blocks trying to send or
- receive a message, or it has had 100 msec of CPU time. This
- algorithm is simple, fair, and easy to implement.
-
- 3. LAYER 2 - DEVICE DRIVERS
- Like all versions of UNIX for the IBM PC, MINIX does
- not use the ROM BIOS for input or output because the BIOS
- does not support interrupts. Suppose a user forks off a
- compilation in the background and then calls the editor. If
- the editor tried to read from the terminal using the BIOS,
- the compilation (and any other background jobs such as
- printing) would be stopped dead in their tracks waiting for
- the the next character to be typed. Such behavior may be
- acceptable in the MS-DOS world, but it certainly is not in
- the UNIX world. As a result, MINIX contains a complete set
- of drivers that duplicate the functions of the BIOS. Like
- the rest of MINIX, these drivers are written in C, not
- assembly language.
- This design has important implications for running
- MINIX on PC clones. A clone whose hardware is not compati-
- ble with the PC down to the chip level, but which tries to
- hide the differences by making the BIOS calls functionally
- identical to IBM's will not run an unmodified MINIX because
- MINIX does not use the BIOS.
- Each device driver is a separate process in MINIX. At
- present, the drivers include the clock driver, terminal
- driver, various disk drivers (e.g., RAM disk, floppy disk),
- and printer driver. Each driver has a main loop consisting
- of three actions:
-
- 1. Wait for an incoming message.
- 2. Perform the request contained in the message.
- 3. Send a reply message.
-
- Request messages have a standard format, containing the
- opcode (e.g., READ, WRITE, or IOCTL), the minor device
- number, the position (e.g., disk block number), the buffer
- address, the byte count, and the number of the process on
- whose behalf the work is being done.
- As an example of where device drivers fit in, consider
- what happens when a user wants to read from a file. The
- user sends a message to the file system. If the file system
- has the needed data in its buffer cache, they are copied
- back to the user. Otherwise, the file system sends a mes-
- sage to the disk task requesting that the block be read into
- a buffer within the file system's address space (in its
- cache). Users may not send messages to the tasks directly.
- Only the servers may do this.
- MINIX supports a RAM disk. In fact, the RAM disk is
- always used to hold the root device. When the system is
- booted, after the operating system has been loaded, the user
- is instructed to insert the root file system diskette. The
- file system then sees how big it is, allocates the necessary
-
-
-
-
-
-
-
-
-
- - 4 -
-
-
- memory, and copies the diskette to the RAM disk. Other file
- systems can then be mounted on the root device.
- This organization puts important directories such as
- /bin and /tmp on the fastest device, and also makes it easy
- to work with either floppy disks or hard disks or a mixture
- of the two by mounting them on /usr or /user or elsewhere.
- In any event, the root device is always in the same place.
- In the standard distribution, the RAM disk is about
- 240K, most of which is full of parts of the C compiler. In
- the 256K system, a much smaller RAM disk has to be used,
- which explains why this version has no C compiler: there is
- no place to put it. (The /usr diskette is completely full
- with the other utility programs and one of the design goals
- was to make the system run on a 256K PC with 1 floppy disk.)
- Users with an unusual configuration such as 256K and three
- hard disks are free to juggle things around as they see fit.
- The terminal driver is compatible with the standard V7
- terminal driver. It supports cooked mode, raw mode, and
- cbreak mode. It also supports several escape sequences,
- such as cursor positioning and reverse scrolling because the
- screen editor needs them.
- The printer driver copies its input to the printer
- character for character without modification. It does not
- even convert line feed to carriage return + line feed. This
- makes it possible to send escape sequences to graphics
- printers without the driver messing things up. MINIX does
- not spool output because floppy disk systems rarely have
- enough spare disk space for the spooling directory. Instead
- one normally would print a file f by saying
-
- lpr <f &
-
- to do the printing in the background. The lpr program
- insert carriage returns, expands tabs, and so on, so it
- should only be used for straight ASCII files. On hard disk
- systems, a spooler would not be difficult to write.
-
- 4. LAYER 3 - SERVERS
- Layer 3 contains two server processes: the memory
- manager and the file system. They are both structured in
- the same way as the device drivers, that is a main loop that
- accepts requests, performs them, and then replies. We will
- now look at each of these in turn.
- The memory manager's job is to handle those system
- calls that affect memory allocation, as well as a few oth-
- ers. These include FORK, EXEC, WAIT, KILL, and BRK. The
- memory model used by MINIX is exceptionally simple in order
- to accommodate computers without any memory management
- hardware. When the shell forks off a process, a copy of the
- shell is made in memory. When the child does an EXEC, the
- new core image is placed in memory. Thereafter it is never
- moved. MINIX does not swap or page.
- The amount of memory allocated to the process is deter-
- mined by a field in the header of the executable file. A
-
-
-
-
-
-
-
-
-
- - 5 -
-
-
- program, chmem, has been provided to manipulate this field.
- When a process is started, the text segment is set at the
- very bottom of the allocated memory area, followed by the
- data and bss. The stack starts at the top of the allocated
- memory and grows downward. The space between the bottom of
- the stack and the top of the data segment is available for
- both segments to grow into as needed. If the two segments
- meet, the process is killed.
- In the past, before paging was invented, all memory
- allocation schemes worked like this. In the future, when
- even small microcomputers will use 32-bit CPUs and 1M x 1
- bit memory chips, the minimum feasible memory will be 4
- megabytes and this allocation scheme will probably become
- popular again due to its inherent simplicity. Thus the
- MINIX scheme can be regarded as either hopelessly outdated
- or amazingly futuristic, as you prefer.
- The memory manager keeps track of memory using a list
- of holes. When new memory is needed, either for FORK or for
- EXEC, it searches the hole list and takes the first hole
- that is big enough (first fit). When a process terminates,
- if it is adjacent to a hole on either side, the process'
- memory and the hole are merged into a bigger hole.
- The file system is really a remote file server that
- happens to be running on the user's machine. However it is
- straightforward to convert it into a true network file
- server. All that needs to be done is change the message
- interface and provide some way of authenticating requests.
- (In MINIX, the source field in the incoming message is
- trustworthy because it is filled in by the kernel.) When
- running remote, the MINIX file server maintains state infor-
- mation, like RFS and unlike NFS.
- The MINIX file system is similar to that of V7 UNIX.
- The i-node is slightly different, containing only 9 disk
- addresses instead of 13, and only 1 time instead of 3.
- These changes reduce the i-node from 64 bytes to 32 bytes,
- to store more i-nodes per disk block and reduce the size of
- the in-core i-node table.
- Free disk blocks and free inodes are kept track of
- using bit maps rather than free lists. The bit maps for the
- root device and all mounted file systems are kept in memory.
- When a file grows, the system makes a definite effort to
- allocate the new block as close as possible to the old ones,
- to minimize arm motion. Disk storage is not necessarily
- allocated one block at a time. A minor device can be con-
- figured to allocate 2, 4 (or more) contiguous blocks when-
- ever a block is allocated. Although this wastes disk space,
- these multiblock zones improve disk performance by keeping
- file blocks close together. The standard parameters for
- MINIX as distributed are 1K blocks and 1K zones (i.e., just
- 1 block per zone).
- MINIX maintains a buffer cache of recently used blocks.
- A hashing algorithm is used to look up blocks in the cache.
- When an i-node block, directory block, or other critical
- block is modified, it is written back to disk immediately.
-
-
-
-
-
-
-
-
-
- - 6 -
-
-
- Data blocks are only written back at the next SYNC or when
- the buffer is needed for something else.
- The MINIX directory system and format is identical to
- that of V7 UNIX. File names are strings of up to 14 charac-
- ters, and directories can be arbitrarily long.
-
- 5. LAYER 4 - USER PROCESSES
- This layer contains init, the shell, the editor, the
- compiler, the utilities, and all the user processes. These
- processes may only send messages to the memory manager and
- the file system, and these servers only accept valid system
- call requests. Thus the user processes do not perceive
- MINIX to be a general-purpose message passing system. How-
- ever, removing the one line of code that checks if the mes-
- sage destination is valid would convert it into a much more
- general system (but less UNIX-like).
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-